AI-Generated Information for Vascular Patients: Assessing the Standard of Procedure-Specific Information Provided by the ChatGPT AI-Language Model

Introduction Ensuring access to high-quality information is paramount to facilitating informed surgical decision-making. The use of the internet to access health-related information is increasing, along with the growing prevalence of AI language models such as ChatGPT. We aim to assess the standard of AI-generated patient-facing information through a qualitative analysis of its readability and quality. Materials and methods We performed a retrospective qualitative analysis of information regarding three common vascular procedures: endovascular aortic repair (EVAR), endovenous laser ablation (EVLA), and femoro-popliteal bypass (FPBP). The ChatGPT responses were compared to patient information leaflets provided by the vascular charity, Circulation Foundation UK. Readability was assessed using four readability scores: the Flesch-Kincaid reading ease (FKRE) score, the Flesch-Kincaid grade level (FKGL), the Gunning fog score (GFS), and the simple measure of gobbledygook (SMOG) index. Quality was assessed using the DISCERN tool by two independent assessors. Results The mean FKRE score was 33.3, compared to 59.1 for the information provided by the Circulation Foundation (SD=14.5, p=0.025), indicating poor readability of AI-generated information. The FKGL indicated that the expected grade of students likely to read and understand ChatGPT responses was consistently higher than that for the information leaflets, at 12.7 vs. 9.4 (SD=1.9, p=0.002). Two metrics measure readability in terms of the number of years of education required to understand a piece of writing: the GFS and SMOG. Both scores indicated that AI-generated answers were less accessible. The GFS for ChatGPT-provided information was 16.7 years versus 12.8 years for the leaflets (SD=2.2, p=0.002) and the SMOG index scores were 12.2 and 9.4 years for ChatGPT and the patient information leaflets, respectively (SD=1.7, p=0.001).
The DISCERN scores were consistently higher in human-generated patient information leaflets compared to AI-generated information across all procedures; the mean score for the information provided by ChatGPT was 50.3 vs. 56.0 for the Circulation Foundation information leaflets (SD=3.38, p<0.001). Conclusion We concluded that AI-generated information about vascular surgical procedures is currently poor in both the readability of text and the quality of information. Patients should be directed to reputable, human-generated information sources from trusted professional bodies to supplement direct education from the clinician during the pre-procedure consultation process.


Introduction
As we progress through the 'information age', there is an increasing number of sources through which patients can learn about their pathology and the treatment options available to them. Although the volume of information is increasing, there can be no assurance of its quality. Large variances exist in both the form and content of patient-facing educational material [1]. Ensuring access to high-quality information is paramount to facilitating informed surgical decision-making for patients undergoing vascular procedures.

Artificial intelligence
One increasingly popular method of gathering information is through the utilization of artificial intelligence (AI). ChatGPT is an AI large language model (LLM) created by OpenAI (San Francisco, CA, USA) [2], which has been trained with information input from a wide variety of publicly available sources, including books, articles, and websites. Once a query is inputted, ChatGPT uses its language model to organize this data into a coherent, grammatical, and contextually correct answer. Since its release in November 2022, it has been reported to have received over 1.5 billion page visits [3]. The use of AI is established in many consumer-facing roles, e.g., customer support or virtual assistance, and is increasingly being deployed in the healthcare sector. In 2023, the UK government announced £100 million for the AI Life Sciences Accelerator Mission to utilize AI in the National Health Service [4].
Research has been completed on the possible uses of ChatGPT for the vascular clinician. In a comprehensive literature review, Fischer et al. explored the uses of AI and machine learning in vascular diagnostics, perioperative risk stratification, and outcome prediction [5]. The technology is still relatively nascent in these fields, requiring further clinical validation and integration into healthcare systems. Our search found no literature on the use of AI in patient health information acquisition.

Patient education
The ultimate responsibility for ensuring adequately informed patient decision-making lies with the consenting clinician per the Royal College of Surgeons of England's (RCSE) Good Surgical Practice guidelines on consent [6]. Supplementary materials have long been used to facilitate patient education, namely information leaflets, educational videos, and increasingly, webpages and internet resources. Their use is explicitly endorsed in the RCSE guidelines. Artificial intelligence language models also have the potential to educate, whether the patient is directed there by the clinician or not.
The quality of information available on the internet to educate vascular patients was first investigated in 1999 by Soot et al., who analyzed the 50 most common websites encountered when searching for common vascular pathologies on search engines. They remarked that "overall quality is poor, and information is difficult to obtain in part because of the large number of irrelevant sites" [7]. The question was revisited in 2012 by Grewal et al., who repeated a similar study, analyzing common websites returned by major search engines [8]. They assessed readability using the FKRE score and the Gunning fog index, and reliability using the LIDA tool; they concluded that "internet information on vascular surgical conditions and procedures is poorly written and unreliable," indicating little progress 13 years on. As online resources have developed in the decade since Grewal et al.'s study, and with the recent introduction of AI language models claiming to be able to present this growing information pool in a readable and conversational manner, the question of the internet's ability to inform the vascular patient population is worth revisiting.

Materials And Methods
We performed a retrospective qualitative assessment of documents intended for patient use, comparing those generated by AI and those produced by human writing. The Circulation Foundation is a UK-registered charity (Charity Number 1102769) and a division of the United Kingdom's Vascular Society [9]. Patient information leaflets were accessed in July 2023 regarding three commonly performed vascular procedures covering domains of aortic, peripheral artery, and venous pathology. These were endovascular aortic repair (EVAR) [10], femoro-popliteal bypass (FPBP) [11], and endovenous laser ablation (EVLA) [12]. ChatGPT, the AI LLM developed by OpenAI using the GPT-3.5 framework [2], was also accessed in July 2023. The input questions were as follows: "I am a patient who has been offered [Vascular Procedure]. What should I know before agreeing to have this procedure?" Responses were then extracted (see Appendix A-C) and assessed for readability and quality.

Assessments of standard
Readability was assessed using the FKRE score, FKGL, GFS, and the SMOG index. Scores were derived from the WebFX (Harrisburg, PA, USA) online readability test [13]. Data is presented as raw values and means. The FKRE score is given on a scale of 0-100, with higher scores signifying better readability. Score creators recommend a target score of 65 or higher. The FKGL is an estimation of the education grade (as per the American education system) required to understand a piece of writing. A lower score indicates more readily accessible information. A recommended target for general audiences is 7th grade or lower. The GFS and SMOG are both measures referring to the years of education required to comprehend material. Once again, a lower score indicates better readability.
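As an illustration of what these four scores measure, the standard published formulas can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function names and the example counts are hypothetical, the counts of words, sentences, syllables, and polysyllabic words are supplied by hand, and the WebFX tool used in this study may tokenize text and count syllables differently, so its output need not match these values.

```python
import math


def flesch_reading_ease(words, sentences, syllables):
    """Flesch reading ease (FKRE in this paper): higher means easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level (US school grade): lower means easier text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Gunning fog score; complex_words counts words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog_index(polysyllables, sentences):
    """SMOG index; polysyllables counts words of three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


# Hypothetical counts for a short passage (not data from this study).
words, sentences, syllables, polysyllables = 100, 5, 150, 10

fkre = flesch_reading_ease(words, sentences, syllables)
fkgl = flesch_kincaid_grade(words, sentences, syllables)
gfs = gunning_fog(words, sentences, polysyllables)
smog = smog_index(polysyllables, sentences)

print(f"FKRE={fkre:.1f} FKGL={fkgl:.1f} GFS={gfs:.1f} SMOG={smog:.1f}")
```

All four formulas are driven by sentence length and word length (in syllables), which is why longer sentences and more polysyllabic words lower the FKRE and raise the other three scores.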
Quality assessments were performed using the DISCERN tool by two independent assessors (AJ and PM). Scores are presented as raw scores out of 80, and means were also calculated. Inter-rater reliability was assessed using Cronbach's alpha score of reliability.
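For readers unfamiliar with Cronbach's alpha, the calculation for two assessors scoring the same set of documents can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the scores are invented for the example and are not the study data, and the study's own calculation was performed in Microsoft Excel.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of raters.

    Each rater is a list of scores given to the same documents in the same order.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    # Sample variance of each rater's scores across the documents.
    rater_variances = [variance(r) for r in ratings]
    # Sample variance of the summed score for each document.
    totals = [sum(scores) for scores in zip(*ratings)]
    return (k / (k - 1)) * (1 - sum(rater_variances) / variance(totals))


# Hypothetical DISCERN totals (out of 80) from two assessors for six documents.
assessor_1 = [48, 52, 50, 57, 55, 56]
assessor_2 = [49, 51, 51, 56, 56, 55]

alpha = cronbach_alpha([assessor_1, assessor_2])
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values close to 1 indicate that the two assessors ranked and scored the documents in nearly the same way.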
Normality was assessed using the Kolmogorov-Smirnov test of normality (see Appendix D). The data was found to be normally distributed, so parametric tests were used. Graphing and statistical analysis were performed in Microsoft Excel (Microsoft Corp., Redmond, WA, USA).

Results
Readability was poor for AI-generated answers across all procedures (Table 1). The mean FKRE score was 33.3, compared to 59.1 for the information provided by the Circulation Foundation (SD=14.5, p=0.025). No information source surpassed the benchmark score of 65, which the metric creators describe as a good level for the general population. The FKGL indicated that the expected grade of students likely to read and understand ChatGPT responses was consistently higher than that for the information leaflets, at 12.7 vs. 9.4 (SD=1.9, p=0.002). Two metrics measured readability in terms of the number of years of education required to understand a piece of writing: the GFS and SMOG. Both scores indicated that AI-generated answers were less accessible. The GFS for ChatGPT-provided information was 16.7 years vs. 12.8 years for the leaflets (SD=2.2, p=0.002), and SMOG index scores were 12.2 and 9.4 years for ChatGPT and patient information leaflets, respectively (SD=1.7, p=0.001).
The content of AI-generated information was also of lower quality overall. The Cronbach alpha inter-observer reliability score was 0.9795, indicating excellent reliability of DISCERN scores between assessors. The DISCERN scores were consistently higher in human-generated patient information leaflets compared to AI-generated information across all procedures (Table 2).

Discussion
Almost a quarter century after Soot et al.'s initial investigation of internet resources for vascular patient education, our analysis found that, despite the introduction of AI boasting accurate and conversational information, there is much progress needed before it can be endorsed for this purpose. When compared to the benchmark of information written by humans and tailored to vascular patient audiences, ChatGPT responses were inferior both in readability and quality of information. Clinicians should continue to direct patients to human-written resources and remember that the ultimate responsibility for educating patients during the consenting process lies with them.

Material risks
Beyond discussions regarding the quality of information provided by AI, it is important at this stage to recall the landmark case of Montgomery vs. Lanarkshire Health Board [14], which defined a duty of care on clinicians to make patients aware of all material risks. The case defined materiality as "a reasonable person in the patient's position would be likely to attach significance to the risk, or the doctor is or should reasonably be aware that the particular patient would be likely to attach significance to it" [14]. Focusing on the latter half of this statement, we are reminded that direct clinician input is required to tailor all education to the particular situation of the patient. The distribution of informational resources is no replacement for direct clinician input, but they are there to be used as a reminder and supplemental reading for the patient.

Guidelines for use
The RCSE details guidelines on consent in Section 3.5.1 of Good Surgical Practice [6]. It states: "Where possible, you should provide written information to patients to enable them to reflect and confirm their decision. You should also provide advice on how they can obtain further information to understand the procedure and their condition. This can include information such as patient leaflets, decision aids, websites and educational videos." Our study recommends that, while adhering to these guidelines, vascular surgeons should not be directing patients to AI LLMs for further information. As detailed by the RCSE, patient leaflets from reputable professional bodies should remain the standard for supplementary patient education. Currently, no guidelines are readily available for the use of AI in surgery from any UK professional surgical body; however, these issues span beyond just the field of vascular surgery. Similar poor outcomes for readability were found in Momenaei et al.'s study assessing AI-generated information on retinal disease [15] and Musheyev et al.'s study regarding urological malignancies [16].
Interestingly, both studies found the information to be accurate and appropriate in content, which was not reproduced in our findings in vascular surgery. This discrepancy indicates potential specialty-dependent inconsistencies in the quality of data used to inform answers. We recommend further investigation into the current use of AI by clinicians and patients across specialties, and that its further use be discouraged until the quality of the information can be reliably and reproducibly assured. Additionally, given that this difference exists between surgical specialties, it may also exist between subjects within the same field. Further research is required to determine inter- and intra-specialty differences in the accuracy of AI-generated information.

Quality assurance
The topic of data quality from AI was discussed at the National Institutes of Health workshop held in 2019 [17]. The panel concluded that the trustworthiness of AI in healthcare is currently impeded by data quality; this finding is reproduced in our analysis. If AI is to be integrated into the domain of patient education, further clinical input should be sought to define the data sets that AI uses to source information as well as quality assurance for patient-facing output. We echo the call from Park et al. for further clinical validation through a study into the current state of AI in healthcare [18]. Just as with any tool or intervention, research must be conducted to determine the validity of AI-generated information before its endorsed integration into health systems, and repeated quality assurance must be undertaken as these systems evolve.

Limitations
This study utilized ChatGPT based on the GPT-3.5 framework. OpenAI has since released GPT-4, a newer iteration. However, this is restricted behind a paywall of $20 per month as part of ChatGPT Plus [19]. OpenAI reports that GPT-4 is "40% more likely to produce factual responses than GPT-3.5 on our internal evaluations" [19]. We decided to use the free-to-use version that is more readily accessible to patients and therefore more likely to represent the experience of patients turning to LLMs to research proposed vascular procedures. Similarly, within the same iteration of ChatGPT, answers may contain slight differences when asked on different occasions. Scores may have differed if queries had been submitted at different times.
The formulae of the four readability metrics used commonly rely on the same two parameters in their assessments, i.e., syllables per word and words per sentence. While providing an objective metric that is useful to facilitate direct comparison, it cannot always be assumed that shorter words are simpler to comprehend. This highlights an important distinction to note between the readability and understandability of language [20]. Outside of formulaic approaches, eye-movement tracking is an alternative proposed method of readability assessment [21] and is an exciting avenue for potential further study to assess the reproducibility of findings.
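The limitation described above can be illustrated with a small sketch. It assumes the standard Flesch reading ease formula and uses two invented sentences with hand-assigned syllable counts (automated syllable counters may differ); the function name is hypothetical. Because the formula sees only word and syllable counts, a sentence containing clinical jargon scores exactly the same as a plain one of the same shape.

```python
def reading_ease(sentences):
    """Flesch reading ease from sentences given as lists of (word, syllables)."""
    words = sum(len(sentence) for sentence in sentences)
    syllables = sum(count for sentence in sentences for _, count in sentence)
    return 206.835 - 1.015 * (words / len(sentences)) - 84.6 * (syllables / words)


# Hand-assigned syllable counts (an assumption made for this example).
plain = [[("The", 1), ("clot", 1), ("may", 1), ("move", 1)]]
jargon = [[("The", 1), ("stent", 1), ("may", 1), ("kink", 1)]]

plain_score = reading_ease(plain)
jargon_score = reading_ease(jargon)

# Both sentences have four one-syllable words, so the scores are identical
# even though a patient may not know what a stent is or what kinking means.
print(plain_score == jargon_score)
```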

Conclusions
Artificial intelligence-generated information was found to be poor in both readability and quality when compared to existing educational material from human experts. Vascular clinicians should favor patient information leaflets from reputable professional institutions when directing patients to supplementary educational material. Further regulation is required to standardize the data input into AI language models and audit the quality of patient-facing output before it can be utilized safely and effectively as an educational resource.
4. Preoperative preparation: Prior to the surgery, your healthcare team will guide you through preoperative instructions. This may include fasting for a certain period before the procedure, stopping certain medications or supplements, and providing information about any allergies or medical conditions you may have.
5. Anesthesia: Femoro-popliteal bypass surgery is usually performed under general anesthesia, meaning you will be unconscious during the procedure. An anesthesiologist will be responsible for administering and monitoring the anesthesia to ensure your safety.
6. Hospital stay and recovery: Following the surgery, you will typically be required to stay in the hospital for a few days. Your healthcare team will monitor your progress, manage pain, and provide postoperative care instructions. You may need to wear compression stockings and take medication to prevent blood clots. A rehabilitation program might be recommended to aid your recovery and restore strength and mobility.
7. Long-term follow-up: After the surgery, regular follow-up visits with your healthcare provider will be necessary to monitor your progress and assess the success of the bypass. It's essential to attend these appointments and report any concerns or changes in your symptoms.
8. Lifestyle changes: Femoro-popliteal bypass surgery is not a cure for the underlying condition. To maintain the benefits of the procedure and reduce the risk of future complications, lifestyle modifications are crucial. This may involve quitting smoking, adopting a healthy diet, exercising regularly, and managing other risk factors such as high blood pressure, diabetes, or high cholesterol.
It is important to have a thorough discussion with your healthcare provider about the procedure, your specific condition, and any concerns you may have. They will be able to provide personalized information and guidance to help you make an informed decision about whether femoro-popliteal bypass surgery is the right option for you.

DISCERN scores (Table 2) are out of a possible 80. The mean score for the information provided by ChatGPT was 50.3 vs. 56.0 for the Circulation Foundation information leaflets (SD=3.38, p<0.001).

TABLE 3: Tests of normality using the Kolmogorov-Smirnov test of normality
KS: Kolmogorov-Smirnov test statistic (D), with lower numbers indicating a more Gaussian distribution pattern; FKRE: Flesch-Kincaid reading ease; FKGL: Flesch-Kincaid grade level; GFS: Gunning fog score; SMOG: Simple measure of gobbledygook
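For context on the D statistic in the table above, a one-sample Kolmogorov-Smirnov statistic against a normal distribution can be sketched as follows. This is a minimal sketch, not the study's calculation (which was performed in Microsoft Excel): the function name and sample values are hypothetical, and fitting the normal distribution to the sample mean and standard deviation is an assumption, since the parameters used in the study are not stated.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, mu=None, sigma=None):
    """One-sample Kolmogorov-Smirnov D against a normal distribution."""
    xs = sorted(sample)
    n = len(xs)
    # Assumption: fall back to sample estimates when no parameters are given.
    mu = mean(xs) if mu is None else mu
    sigma = stdev(xs) if sigma is None else sigma
    dist = NormalDist(mu, sigma)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # Largest gap between the empirical step function and the normal CDF,
        # checked just after and just before each step.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical readability scores (not data from this study).
scores = [31.2, 35.4, 33.3, 58.0, 61.5, 57.8]
print(f"D = {ks_statistic(scores):.3f}")
```

A smaller D means the empirical distribution of the sample lies closer to the fitted normal curve.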